Managing the Exploration/Exploitation Trade-Off in Reinforcement Learning
Authors
Abstract
In this work, we present a model that integrates exploration and exploitation in a common framework. First, we define the degree of exploration from a state as the entropy of the probability distribution over the set of admissible actions in that state. This entropy value, which is provided by the user, controls the degree of exploration associated with the state. We then restate the exploration/exploitation problem as a global optimization problem: find the exploration strategy that minimizes the expected cumulative cost while maintaining fixed degrees of exploration. This formulation leads to a set of nonlinear updating rules reminiscent of the "value iteration" algorithm. Interestingly, when the degree of exploration is zero for all states (no exploration), these equations reduce to Bellman's equations for finding the shortest path, while, when it is maximal, a full "blind" exploration is performed. We further show that if the graph of states is directed and acyclic, the nonlinear equations can easily be solved by performing a single backward pass from the destination state. The theoretical results are confirmed by simple simulations showing that the model behaves as expected.
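The two limiting cases described above can be illustrated with a Boltzmann (softmax) policy over action costs. This is a standard construction used only to make the entropy-as-exploration idea concrete, not the authors' actual updating rules; the function name `boltzmann_policy` and the temperature parameter `theta` are illustrative:

```python
import math

def entropy(p):
    """Shannon entropy of a probability distribution over actions."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def boltzmann_policy(costs, theta):
    """Softmax distribution over actions; theta controls exploration.

    Lower cost is better, as in shortest-path formulations. As
    theta -> 0 the policy concentrates on the cheapest action (pure
    exploitation, entropy -> 0); as theta -> infinity it approaches
    the uniform distribution (blind exploration, entropy -> log |A|).
    """
    m = min(costs)  # subtract the minimum for numerical stability
    w = [math.exp(-(c - m) / theta) for c in costs]
    z = sum(w)
    return [wi / z for wi in w]

costs = [1.0, 2.0, 5.0]
greedy = boltzmann_policy(costs, 1e-6)  # near-deterministic: picks action 0
blind = boltzmann_policy(costs, 1e6)    # near-uniform over all 3 actions
```

Fixing the entropy of such a distribution per state, rather than a temperature, is what turns the trade-off into the constrained optimization problem the abstract describes.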
Similar resources
Feature Selection Based on Reinforcement Learning for Object Recognition
This paper presents a novel method that allows learning the best feature that describes a given image. It is intended to be used in object recognition. The proposed approach is based on the use of a Reinforcement Learning procedure that selects the best descriptor for every image from a given set. In order to do this, we introduce a new architecture joining a Reinforcement Learning technique wi...
Dopaminergic Control of the Exploration-Exploitation Trade-Off via the Basal Ganglia
We continuously face the dilemma of choosing between actions that gather new information or actions that exploit existing knowledge. This "exploration-exploitation" trade-off depends on the environment: stability favors exploiting knowledge to maximize gains; volatility favors exploring new options and discovering new outcomes. Here we set out to reconcile recent evidence for dopamine's involve...
Using Confidence Bounds for Exploitation-Exploration Trade-offs
We show how a standard tool from statistics — namely confidence bounds — can be used to elegantly deal with situations which exhibit an exploitation-exploration trade-off. Our technique for designing and analyzing algorithms for such situations is general and can be applied when an algorithm has to make exploitation-versus-exploration decisions based on uncertain information provided by a rando...
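The classic instance of this confidence-bound technique is the UCB1 rule for multi-armed bandits. The sketch below is a generic illustration of the idea, not the specific algorithm of the paper above; the function name `ucb1_choice` is illustrative:

```python
import math

def ucb1_choice(counts, means, t):
    """Pick the arm maximizing empirical mean + confidence radius (UCB1).

    Arms never tried get priority (infinite bonus). The radius
    sqrt(2 ln t / n_i) shrinks as arm i is sampled more often, so the
    rule shifts automatically from exploration toward exploitation.
    """
    best, best_score = None, -math.inf
    for i, (n, mu) in enumerate(zip(counts, means)):
        score = math.inf if n == 0 else mu + math.sqrt(2 * math.log(t) / n)
        if score > best_score:
            best, best_score = i, score
    return best
```

For example, an untried arm is always selected before any well-sampled arm, regardless of the observed means.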
A Comparison of Exploration/Exploitation Techniques for a Q-Learning Agent in the Wumpus World
The Q-Learning algorithm, suggested by Watkins [1], has become one of the most popular reinforcement learning algorithms due to its relatively simple implementation and the complexity reduction gained by the use of a model-free method. However, Q-Learning does not specify how to trade off exploration of the world for exploitation of the developed policy. Multiple such tradeoffs are possible and ...
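The simplest exploration strategy usually compared in such studies is epsilon-greedy action selection. This is a generic sketch of that standard technique, not code from the paper above; the function name `epsilon_greedy` is illustrative:

```python
import random

def epsilon_greedy(q_values, epsilon, rng=random):
    """With probability epsilon pick a uniformly random action (explore),
    otherwise the action with the highest Q-value (exploit)."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])
```

With `epsilon = 0` this reduces to pure greedy exploitation; with `epsilon = 1` it is uniform blind exploration.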
Real-World Robot Navigation by Two-Dimensional Evaluation Reinforcement Learning
The trade-off between exploration and exploitation is present in any learning method based on trial and error, such as reinforcement learning. We have proposed a reinforcement learning algorithm using reward and punishment as repulsive evaluation (2D-RL). In the algorithm, an appropriate balance between exploration and exploitation can be attained by using interest and utility. In this paper, we appl...